
    Learning language through pictures

    We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective: given a textual description of a scene, it concurrently predicts the scene's visual representation and the next word in the sentence. Mimicking an important aspect of human language learning, it acquires meaning representations for individual words from descriptions of visual scenes. Moreover, it learns to effectively use sequential structure in the semantic interpretation of multi-word phrases.
    Comment: To appear at ACL 201
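    The multi-task objective described in this abstract can be sketched as follows. This is a toy numpy illustration under stated assumptions: the dimensions are invented, and the mean-of-embeddings "encoder" is a stand-in for the paper's GRU networks; only the shape of the objective (shared embeddings feeding a next-word cross-entropy head and a visual MSE head) follows the text.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy dimensions, not from the paper.
    vocab, emb_dim, hid_dim, img_dim = 10, 8, 8, 6

    E = rng.normal(size=(vocab, emb_dim))        # shared word embeddings
    W_lm = rng.normal(size=(hid_dim, vocab))     # next-word (language) head
    W_vis = rng.normal(size=(hid_dim, img_dim))  # visual-prediction head

    def encode(tokens):
        # Stand-in for the paper's GRU encoder: mean of the token embeddings.
        return E[tokens].mean(axis=0)            # emb_dim == hid_dim here

    def multitask_loss(tokens, next_word, img_vec, alpha=0.5):
        h = encode(tokens)
        logits = h @ W_lm
        logits = logits - logits.max()                     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        lm_loss = -log_probs[next_word]                    # next-word cross-entropy
        vis_loss = ((h @ W_vis - img_vec) ** 2).mean()     # visual-prediction MSE
        return float(alpha * lm_loss + (1 - alpha) * vis_loss)
    ```

    Both terms share the encoder state `h`, so gradients from the visual task shape the same word embeddings the language-modeling task uses.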

    Revisiting the Hierarchical Multiscale LSTM

    Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training procedure and implementations might hinder its applicability. We provide a detailed reproduction and ablation study of the architecture, shedding light on some of the potential caveats of re-purposing complex deep-learning architectures. We further show that simplifying certain aspects of the architecture can in fact improve its performance. We also investigate the linguistic units (segments) learned by various levels of the model, and argue that their quality does not correlate with the overall performance of the model on language modeling.
    Comment: To appear in COLING 2018 (reproduction track)

    Lessons learned in multilingual grounded language learning

    Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages benefit from training with higher-resource languages. We demonstrate that a multilingual model can be trained equally well on either translations or comparable sentence pairs, and that annotating the same set of images in multiple languages enables further improvements via an additional caption-caption ranking objective.
    Comment: CoNLL 201
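    The caption-caption ranking objective mentioned above is typically a max-margin loss over caption similarities. The following is a minimal sketch, assuming cosine similarity and a hinge margin; the function name and margin value are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def cosine(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def caption_ranking_loss(anchor, positive, negatives, margin=0.2):
        # Max-margin ranking: the caption describing the same image must
        # outscore each contrastive caption by at least `margin`.
        pos = cosine(anchor, positive)
        return sum(max(0.0, margin - pos + cosine(anchor, n)) for n in negatives)
    ```

    With multiple captions per image across languages, every extra annotation supplies additional positive pairs for this objective at no extra image cost.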

    Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

    Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry aims to attack such models by rewriting an author's text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither data nor target models are accessible. We introduce a transformer-based extension of a lexical replacement attack, and show it achieves high transferability when trained on a weakly labeled corpus -- decreasing target model performance below chance. While not completely inconspicuous, our more successful attacks also prove notably less detectable by humans. Our framework therefore provides a promising direction for future privacy-preserving adversarial attacks.
    Comment: Accepted to EACL 202
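    The core of a lexical replacement attack can be sketched in a few lines: greedily swap words whose substitution most lowers a surrogate profiler's score. This toy sketch is not the paper's transformer-based method; the substitution dictionary and scoring function are hypothetical stand-ins for a learned substitute generator and a surrogate model.

    ```python
    def greedy_substitute(tokens, score_fn, candidates, budget=2):
        # Greedy lexical replacement: at each step, apply the single
        # substitution that most reduces the surrogate model's score.
        tokens = list(tokens)
        for _ in range(budget):
            base = score_fn(tokens)
            best = None
            for i, tok in enumerate(tokens):
                for sub in candidates.get(tok, []):
                    trial = tokens[:i] + [sub] + tokens[i + 1:]
                    drop = base - score_fn(trial)
                    if drop > 0 and (best is None or drop > best[0]):
                        best = (drop, i, sub)
            if best is None:          # no substitution helps; stop early
                break
            _, i, sub = best
            tokens[i] = sub
        return tokens
    ```

    Transferability, as studied in the paper, means substitutions chosen against the surrogate `score_fn` also degrade unseen target models.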

    NeuralREG: An end-to-end approach to referring expression generation

    Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines. Data and models are publicly available.
    Comment: Accepted for presentation at ACL 201
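    The classic form-then-content pipeline that NeuralREG replaces can be illustrated with a toy rule: first mention of an entity gets a full name, later mentions get a pronoun, applied to a delexicalized template. The tag format, names and pronouns here are invented for illustration; the paper's model makes these decisions end-to-end instead.

    ```python
    def refer(entity, mentioned, names, pronouns):
        # Toy "form" decision: full name on first mention, pronoun afterwards.
        if entity in mentioned:
            return pronouns[entity]
        mentioned.add(entity)
        return names[entity]

    def relexicalize(template, names, pronouns):
        # Replace delexicalized entity tags (e.g. "ENT-1") in a template
        # with generated referring expressions.
        mentioned = set()
        return " ".join(
            refer(t, mentioned, names, pronouns) if t in names else t
            for t in template.split()
        )
    ```

    NeuralREG collapses this two-stage choice (pronoun vs. name, and which name) into a single neural decoding step conditioned on the discourse context.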

    On the difficulty of a distributional semantics of spoken language

    In the domain of unsupervised learning, most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language there is a large body of work on unsupervised induction of semantic representations of words, whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of the semantics of spoken language becomes feasible if we abstract away from the surface variability. We simulate this setting with a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, to varying degrees of success, learn semantic representations of speech fragments. Finally, we present inconclusive results on human speech, and discuss the challenges inherent in learning distributional semantic representations from unrestricted natural spoken language.

    The C-terminal domain of the 2b protein of Cucumber mosaic virus is stabilized by divalent metal ion coordination

    The main function of the 2b protein of Cucumber mosaic virus (CMV) is to permanently bind double-stranded siRNA molecules during the suppression of post-transcriptional gene silencing (PTGS). The crystal structure of the homologous Tomato aspermy virus (TAV) 2b protein is known, but without the C-terminal domain. The biologically active form is a tetramer: four 2b protein molecules and two siRNA duplexes. To study the complete 2b protein structure, we performed a molecular dynamics (MD) simulation of the whole siRNA–2b ribonucleoprotein complex. Unfortunately, the C-terminal domain proved to be partially unstructured. Multiple sequence alignment revealed a well-conserved motif between residues 94 and 105. The negatively charged residues of the C-terminal domain are presumed to take part in the coordination of a divalent metal ion and thereby stabilize the three-dimensional structure of the C-terminal domain. MD simulations were performed on the detached C-terminal domains (aa 65–110), using 0.15 M MgCl2, CaCl2, FeCl2 and ZnCl2 salt concentrations in the screening simulations. Among the tested divalent metal ions, Mg2+ proved the most successful, because Asp95, Asp96 and Asp98 form a quasi-permanent Mg2+ binding site. Control computations showed that at least one divalent metal ion remains in the binding site after replacement of the bound Mg2+ ion. A quadruple mutation (Rs2DDTD/95–98/AAAA) was introduced at the position of the putative divalent metal ion binding site to test the biological relevance of the hypothesis derived from molecular modeling. Plant inoculation experiments showed that movement of the mutant virus is slower and its symptoms are milder compared to the wild-type virus. These results demonstrate that the quadruple mutation weakens the stability of the 2b protein tetramer–siRNA ribonucleoprotein complex.

    Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning

    Recent work has highlighted the advantage of jointly learning grounded sentence representations from multiple languages. However, the data used in these studies has been limited to an aligned scenario: the same images annotated with sentences in multiple languages. We focus on the more realistic disjoint scenario in which there is no overlap between the images in multilingual image--caption datasets. We confirm that training with aligned data results in better grounded sentence representations than training with disjoint data, as measured by image--sentence retrieval performance. In order to close this gap in performance, we propose a pseudopairing method to generate synthetically aligned English--German--image triplets from the disjoint sets. The method works by first training a model on the disjoint data, and then creating new triples across datasets using sentence similarity under the learned model. Experiments show that pseudopairs improve image--sentence retrieval performance compared to disjoint training, despite requiring no external data or models. However, we do find that using an external machine translation model to generate the synthetic data sets results in better performance.
    Comment: 10 page
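    The pseudopairing step described above (embed captions with the disjoint-trained model, then pair each English caption with its nearest German caption and inherit that caption's image) can be sketched as follows. This is a minimal numpy sketch; the function name and the use of cosine similarity via normalized dot products are assumptions, not the paper's exact procedure.

    ```python
    import numpy as np

    def pseudopairs(en_vecs, de_vecs, de_images):
        # en_vecs, de_vecs: caption embeddings from the disjoint-trained model.
        # Normalize rows so the dot product equals cosine similarity.
        en = en_vecs / np.linalg.norm(en_vecs, axis=1, keepdims=True)
        de = de_vecs / np.linalg.norm(de_vecs, axis=1, keepdims=True)
        # For each English caption, the index of the most similar German one.
        nearest = (en @ de.T).argmax(axis=1)
        # Synthetic English--German--image triples: each English caption
        # inherits the image of its nearest German caption.
        return [(i, int(j), de_images[int(j)]) for i, j in enumerate(nearest)]
    ```

    Because the pairing model is trained only on the disjoint data itself, the method needs no external resources, which is the property the abstract contrasts with the machine-translation baseline.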